16 research outputs found
Survey of Machine Learning Techniques for Malware Analysis
Coping with malware is getting more and more challenging, given their
relentless growth in complexity and volume. One of the most common approaches
in literature is using machine learning techniques, to automatically learn
models and patterns behind such complexity, and to develop technologies for
keeping pace with the speed of development of novel malware. This survey aims
at providing an overview on the way machine learning has been used so far in
the context of malware analysis. We systematize surveyed papers according to
their objectives (i.e., the expected output, what the analysis aims to), what
information about malware they specifically use (i.e., the features), and what
machine learning techniques they employ (i.e., what algorithm is used to
process the input and produce the output). We also outline a number of problems
concerning the datasets used in considered works, and finally introduce the
novel concept of malware analysis economics, regarding the study of existing
tradeoffs among key metrics, such as analysis accuracy and economical costs
Android Malware Family Classification Based on Resource Consumption over Time
The vast majority of today's mobile malware targets Android devices. This has
pushed the research effort in Android malware analysis in the last years. An
important task of malware analysis is the classification of malware samples
into known families. Static malware analysis is known to fall short against
techniques that change static characteristics of the malware (e.g. code
obfuscation), while dynamic analysis has proven effective against such
techniques. To the best of our knowledge, the most notable work on Android
malware family classification purely based on dynamic analysis is DroidScribe.
With respect to DroidScribe, our approach is easier to reproduce. Our
methodology only employs publicly available tools, does not require any
modification to the emulated environment or Android OS, and can collect data
from physical devices. The latter is a key factor, since modern mobile malware
can detect the emulated environment and hide their malicious behavior. Our
approach relies on resource consumption metrics available from the proc file
system. Features are extracted through detrended fluctuation analysis and
correlation. Finally, a SVM is employed to classify malware into families. We
provide an experimental evaluation on malware samples from the Drebin dataset,
where we obtain a classification accuracy of 82%, proving that our methodology
achieves an accuracy comparable to that of DroidScribe. Furthermore, we make
the software we developed publicly available, to ease the reproducibility of
our results.Comment: Extended Versio
Towards a Near-real-time Protocol Tunneling Detector based on Machine Learning Techniques
In the very last years, cybersecurity attacks have increased at an
unprecedented pace, becoming ever more sophisticated and costly. Their impact
has involved both private/public companies and critical infrastructures. At the
same time, due to the COVID-19 pandemic, the security perimeters of many
organizations expanded, causing an increase of the attack surface exploitable
by threat actors through malware and phishing attacks. Given these factors, it
is of primary importance to monitor the security perimeter and the events
occurring in the monitored network, according to a tested security strategy of
detection and response. In this paper, we present a protocol tunneling detector
prototype which inspects, in near real time, a company's network traffic using
machine learning techniques. Indeed, tunneling attacks allow malicious actors
to maximize the time in which their activity remains undetected. The detector
monitors unencrypted network flows and extracts features to detect possible
occurring attacks and anomalies, by combining machine learning and deep
learning. The proposed module can be embedded in any network security
monitoring platform able to provide network flow information along with its
metadata. The detection capabilities of the implemented prototype have been
tested both on benign and malicious datasets. Results show 97.1% overall
accuracy and an F1-score equals to 95.6%.Comment: 12 pages, 4 figures, 4 table
Share a pie? Privacy-preserving knowledge base export through count-min sketches
Knowledge base (KB) sharing among parties has been proven to be beneficial in several scenarios. However such sharing can arise considerable privacy concerns depending on the sensitivity of the information stored in each party’s KB.
In this paper, we focus on the problem of exporting a (part of a) KB of a party towards a receiving one. We introduce a novel solution that enables parties to export data in a privacy-preserving fashion, based on a probabilistic data structure, namely the count-min sketch. With this data structure, KBs can be exported in the form of key-value stores and inserted into a set of count-min sketches, where keys can be sensitive and values are counters. Count-min sketches can be tuned to achieve a given key collision probability, which enables a party to deny having certain keys in its own KB, and thus to preserve its privacy. We also introduce a metric, the Îł-deniability (novel for count-min sketches), to measure the privacy level obtainable with a count-min sketch. Furthermore, since the value associated to a key can expose to linkage attacks, noise can be added to a count-min sketch to ensure controlled error on retrieved values. Key collisions and noise alter the values contained in the exported KB, and can affect negatively the accuracy of a computation performed on the exported KB. We explore the tradeoff between privacy preservation and computation accuracy by experimental evaluations in two scenarios related to malware detection
An architecture for semi-automatic collaborative malware analysis for CIs
Critical Infrastructures (CIs) are among the main targets of activists, cyber terrorists and state sponsored attacks. To protect itself, a CI needs to build and keep updated a domestic knowledge base of cyber threats. It cannot indeed completely rely on external service providers because information on incidents can be so sensible to impact national security. In this paper, we propose an architecture for a malware analysis framework to support CIs in such a challenging task. Given the huge number of new malware produced daily, the architecture is designed so as to automate the analysis to a large extent, leaving to human analysts only a small and manageable part of the whole effort. Such a non-automatic part of the analysis requires a wide range of expertise, usually contributed by more analysts. The architecture enables analysts to work collaboratively to improve the understanding of samples that demand deeper investigations (intra-CI collaboration). Furthermore, the architecture allows to share partial and configurable views of the knowledge base with other interested CIs, in order to collectively obtain a more complete vision of the cyber threat landscape (inter-CI collaboration)
Towards the usage of invariant-based app behavioral fingerprinting for the detection of obfuscated versions of known malware
App fingerprints can be used to verify whether two apps are the same, and are useful tools for malware detection because they can allow to recognize obfuscated versions of known malware. Fingerprinting an app on the base of static features is known to fail against obfuscation, as it is successful in hiding the static characteristics that reveal the malicious nature of an app. In this paper we propose a novel way to compute app fingerprints, which is based on behavioral features. The aim is to capture the semantics of the app, so that obfuscation results ineffective. The technique we introduce exploits invariants, found among pairs of metrics, collected during app execution, and produces a fingerprint consisting of the list of the correlation values of these pairs. We present an experimental evaluation carried out on a real Android device, whose obtained results support the methodology we propose, and show it can be a viable research direction to investigate further
Ultrasound evaluation of the uterus in the uncomplicated postpartum period: a systematic review
The aim of this systematic review and meta-analysis was to define the means and the upper limits of normal for endometrial thickness and uterine measurements in uncomplicated pregnancies at different postpartum periods
The goods, the bads and the uglies: Supporting decisions in malware detection through visual analytics
Malware associated with Web downloads is responsible for many attacks trying to execute malicious code on a remote machine. Web browsers are protected by anti-malware utilities that try to distinguish between good downloads and bad downloads, blocking the bad ones and alerting the user. In order to cope with the uncertainty of such a process, very often the final decision is made using suitable thresholds, giving rise to a 3 categories classification: good downloads, bad downloads, and “in the middle” downloads (i.e., the uglies). In this situation, it is possible to involve the user (e.g., the security manager) in the decision loop, presenting him with the details of the decision process in a way he can either be more confident about the system decisions or he can refine what has been done automatically, e.g., promoting an ugly download to a good one. The paper addresses this problem presenting a visual analytics solution supporting the analysis of the classification system presented in AMICO [24], providing the user with a better understanding of the classification decisions and the possibility of changing the classification results. A prototype is available at: http://awareserver.dis.uniroma1.it:11768/malvis/